Overview

Dataset statistics

Number of variables: 3
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 23.6 KiB
Average record size in memory: 24.1 B

Variable types

Numeric: 2
Categorical: 1

Alerts

address has a high cardinality: 968 distinct values (High cardinality)
address is uniformly distributed (Uniform)
df_index has unique values (Unique)
statementID has unique values (Unique)

Reproduction

Analysis started: 2022-06-01 21:25:32.435117
Analysis finished: 2022-06-01 21:25:49.435177
Duration: 17 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6728.444
Minimum: 34
Maximum: 13225
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 34
5-th percentile: 705.75
Q1: 3291.5
Median: 6828
Q3: 10201
95-th percentile: 12602.5
Maximum: 13225
Range: 13191
Interquartile range (IQR): 6909.5
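
The range and interquartile range listed here follow directly from the other quantiles. A minimal Python check, using only values copied from this summary (the variable names are illustrative):

```python
# Quantile values copied from the df_index summary.
minimum, q1, q3, maximum = 34, 3291.5, 10201, 13225

value_range = maximum - minimum  # spread between the two extremes
iqr = q3 - q1                    # spread of the middle 50% of values

print(value_range)  # 13191
print(iqr)          # 6909.5
```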

Descriptive statistics

Standard deviation: 3883.28384
Coefficient of variation (CV): 0.5771444096
Kurtosis: -1.256303895
Mean: 6728.444
Median Absolute Deviation (MAD): 3442.5
Skewness: -0.03639327311
Sum: 6728444
Variance: 15079893.38
Monotonicity: Not monotonic
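
Two of these figures are derived from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A short sketch that recomputes both from the reported values (rounded inputs, so the results agree only to the displayed precision):

```python
# Values copied from the df_index descriptive statistics.
mean = 6728.444
std = 3883.28384

cv = std / mean      # coefficient of variation: spread relative to the mean
variance = std ** 2  # variance is the squared standard deviation

print(round(cv, 4))     # 0.5771
print(round(variance))  # 15079893
```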
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
9699                1      0.1%
909                 1      0.1%
12953               1      0.1%
1926                1      0.1%
1354                1      0.1%
11599               1      0.1%
9518                1      0.1%
1158                1      0.1%
9550                1      0.1%
2183                1      0.1%
Other values (990)  990    99.0%

Minimum 10 values

Value               Count  Frequency (%)
34                  1      0.1%
36                  1      0.1%
46                  1      0.1%
50                  1      0.1%
60                  1      0.1%
73                  1      0.1%
77                  1      0.1%
104                 1      0.1%
127                 1      0.1%
141                 1      0.1%

Maximum 10 values

Value               Count  Frequency (%)
13225               1      0.1%
13219               1      0.1%
13212               1      0.1%
13211               1      0.1%
13202               1      0.1%
13199               1      0.1%
13192               1      0.1%
13172               1      0.1%
13154               1      0.1%
13150               1      0.1%

statementID
Real number (ℝ≥0)

UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.338909544 × 10^18
Minimum: 4.805416525 × 10^16
Maximum: 1.842633337 × 10^19
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 4.805416525 × 10^16
5-th percentile: 9.011215325 × 10^17
Q1: 4.802867636 × 10^18
Median: 9.474764251 × 10^18
Q3: 1.404373252 × 10^19
95-th percentile: 1.74850693 × 10^19
Maximum: 1.842633337 × 10^19
Range: 1.83782792 × 10^19
Interquartile range (IQR): 9.240864884 × 10^18

Descriptive statistics

Standard deviation: 5.317991302 × 10^18
Coefficient of variation (CV): 0.5694445671
Kurtosis: -1.215034819
Mean: 9.338909544 × 10^18
Median Absolute Deviation (MAD): 4.607376413 × 10^18
Skewness: -0.05138643122
Sum: 9.338909544 × 10^21
Variance: 2.828103149 × 10^37
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                 Count  Frequency (%)
9.120979902 × 10^18   1      0.1%
1.288311176 × 10^19   1      0.1%
7.693078692 × 10^18   1      0.1%
1.956796832 × 10^18   1      0.1%
5.222305948 × 10^18   1      0.1%
1.113272739 × 10^19   1      0.1%
7.511874901 × 10^17   1      0.1%
1.082068867 × 10^19   1      0.1%
9.418428509 × 10^18   1      0.1%
8.549520696 × 10^18   1      0.1%
Other values (990)    990    99.0%

Minimum 10 values

Value                 Count  Frequency (%)
4.805416525 × 10^16   1      0.1%
5.683279896 × 10^16   1      0.1%
6.903824493 × 10^16   1      0.1%
7.63257287 × 10^16    1      0.1%
7.791625436 × 10^16   1      0.1%
9.287754387 × 10^16   1      0.1%
1.116827498 × 10^17   1      0.1%
1.126026089 × 10^17   1      0.1%
1.389205943 × 10^17   1      0.1%
1.419158908 × 10^17   1      0.1%

Maximum 10 values

Value                 Count  Frequency (%)
1.842633337 × 10^19   1      0.1%
1.842188255 × 10^19   1      0.1%
1.839321107 × 10^19   1      0.1%
1.838407239 × 10^19   1      0.1%
1.837605231 × 10^19   1      0.1%
1.836613845 × 10^19   1      0.1%
1.835866244 × 10^19   1      0.1%
1.833474244 × 10^19   1      0.1%
1.832524726 × 10^19   1      0.1%
1.831628691 × 10^19   1      0.1%

address
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 968
Distinct (%): 96.8%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Kemp House, 160 City Road, London, EC1V 2NX: 7
20-22 Wenlock Road, London, N1 7GU: 6
20-22, Wenlock Road, London, N1 7GU: 6
71-75 Shelton Street, Covent Garden, London, WC2H 9JQ: 4
71-75 Shelton Street, London, Greater London, WC2H 9JQ: 4
Other values (963): 973

Length

Max length: 102
Median length: 77
Mean length: 50.695
Min length: 26

Characters and Unicode

Total characters: 50695
Distinct characters: 74
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
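
As an illustration of how such character properties can be read, the sketch below counts the Unicode general category of each character in one of the sample addresses from this report, using Python's standard `unicodedata` module. The helper name `category_counts` is made up for this example; the two-letter codes correspond to the category names in the report (`Ll` Lowercase Letter, `Lu` Uppercase Letter, `Zs` Space Separator, `Nd` Decimal Number, `Po` Other Punctuation).

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One of the sample addresses shown in this report.
counts = category_counts("122 Sunnymead, Peterborough, PE4 5BZ")
for code, n in counts.most_common():
    print(code, n)
```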

Unique

Unique: 955
Unique (%): 95.5%
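
The figures in this report suggest that "distinct" counts how many different values are present (968), while "unique" counts values that occur exactly once (955). A small sketch of that reading on made-up data (the toy list is hypothetical, not taken from the dataset):

```python
from collections import Counter

# Hypothetical toy column: "A" repeats, "B" and "C" occur once each.
values = ["A", "A", "B", "C"]
freq = Counter(values)

distinct = len(freq)                              # different values present
unique = sum(1 for n in freq.values() if n == 1)  # values seen exactly once

print(distinct, unique)  # 3 2
```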

Sample

1st row: 113, Ellisland, Kirkintilloch, Scotland, G66 2UA
2nd row: 2, Queens Quay, Belfast, Co Antrim, BT3 9QQ
3rd row: Redbridge Farm, Dolmans Hill, Lytchett Matravers, Poole, Dorset, BH16 6HP
4th row: Churton House Chester Road, Churton, Chester, CH3 6LA
5th row: 122 Sunnymead, Peterborough, PE4 5BZ

Common Values

Value                                                     Count  Frequency (%)
Kemp House, 160 City Road, London, EC1V 2NX               7      0.7%
20-22 Wenlock Road, London, N1 7GU                        6      0.6%
20-22, Wenlock Road, London, N1 7GU                       6      0.6%
71-75 Shelton Street, Covent Garden, London, WC2H 9JQ     4      0.4%
71-75 Shelton Street, London, Greater London, WC2H 9JQ    4      0.4%
Chase Business Centre, 39-41 Chase Side, London, N14 5BP  3      0.3%
27, Old Gloucester Street, London, WC1N 3AX               3      0.3%
85 Great Portland Street, First Floor, London, W1W 7LT    2      0.2%
3 Crewe Road, Sandbach, Cheshire, CW11 4NE                2      0.2%
130, Old Street, London, EC1V 9BD                         2      0.2%
Other values (958)                                        961    96.1%

Length

Histogram of lengths of the category

Words

Value                Count  Frequency (%)
road                 350    4.4%
london               225    2.8%
street               161    2.0%
house                150    1.9%
park                 68     0.9%
lane                 62     0.8%
west                 53     0.7%
the                  52     0.7%
avenue               45     0.6%
1                    44     0.6%
Other values (3636)  6745   84.8%

Most occurring characters

Value              Count  Frequency (%)
(space)            6957   13.7%
e                  3485   6.9%
,                  3257   6.4%
o                  2823   5.6%
r                  2402   4.7%
n                  2269   4.5%
a                  2224   4.4%
t                  1920   3.8%
s                  1565   3.1%
d                  1491   2.9%
Other values (64)  22302  44.0%

Most occurring categories

Value              Count  Frequency (%)
Lowercase Letter   26850  53.0%
Uppercase Letter   9036   17.8%
Space Separator    6957   13.7%
Decimal Number     4407   8.7%
Other Punctuation  3321   6.6%
Dash Punctuation   116    0.2%
Open Punctuation   4      < 0.1%
Close Punctuation  4      < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
e                  3485   13.0%
o                  2823   10.5%
r                  2402   8.9%
n                  2269   8.5%
a                  2224   8.3%
t                  1920   7.2%
s                  1565   5.8%
d                  1491   5.6%
i                  1459   5.4%
l                  1449   5.4%
Other values (16)  5763   21.5%

Uppercase Letter
Value              Count  Frequency (%)
S                  836    9.3%
L                  693    7.7%
R                  674    7.5%
B                  644    7.1%
H                  614    6.8%
C                  606    6.7%
W                  504    5.6%
A                  456    5.0%
E                  429    4.7%
N                  416    4.6%
Other values (16)  3164   35.0%

Decimal Number
Value              Count  Frequency (%)
1                  996    22.6%
2                  649    14.7%
3                  464    10.5%
4                  395    9.0%
7                  361    8.2%
5                  360    8.2%
6                  349    7.9%
0                  294    6.7%
8                  281    6.4%
9                  258    5.9%

Other Punctuation
Value              Count  Frequency (%)
,                  3257   98.1%
/                  24     0.7%
.                  24     0.7%
&                  8      0.2%
'                  5      0.2%
:                  3      0.1%

Open Punctuation
Value              Count  Frequency (%)
(                  3      75.0%
[                  1      25.0%

Close Punctuation
Value              Count  Frequency (%)
)                  3      75.0%
]                  1      25.0%

Space Separator
Value              Count  Frequency (%)
(space)            6957   100.0%

Dash Punctuation
Value              Count  Frequency (%)
-                  116    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   35886  70.8%
Common  14809  29.2%

Most frequent character per script

Latin
Value              Count  Frequency (%)
e                  3485   9.7%
o                  2823   7.9%
r                  2402   6.7%
n                  2269   6.3%
a                  2224   6.2%
t                  1920   5.4%
s                  1565   4.4%
d                  1491   4.2%
i                  1459   4.1%
l                  1449   4.0%
Other values (42)  14799  41.2%

Common
Value              Count  Frequency (%)
(space)            6957   47.0%
,                  3257   22.0%
1                  996    6.7%
2                  649    4.4%
3                  464    3.1%
4                  395    2.7%
7                  361    2.4%
5                  360    2.4%
6                  349    2.4%
0                  294    2.0%
Other values (12)  727    4.9%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  50695  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
(space)            6957   13.7%
e                  3485   6.9%
,                  3257   6.4%
o                  2823   5.6%
r                  2402   4.7%
n                  2269   4.5%
a                  2224   4.4%
t                  1920   3.8%
s                  1565   3.1%
d                  1491   2.9%
Other values (64)  22302  44.0%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
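
The recipe in the previous paragraph (rank both variables, then divide the covariance of the ranks by the product of their standard deviations) can be sketched in plain Python. This is an illustrative sketch, not the code the profiling tool runs; the helper names `average_ranks` and `spearman_rho` are made up here, and giving tied values their average rank is an assumption.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables over the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry)) / n
    std_rx = (sum((a - mean_rx) ** 2 for a in rx) / n) ** 0.5
    std_ry = (sum((b - mean_ry) ** 2 for b in ry) / n) ** 0.5
    return cov / (std_rx * std_ry)


# Monotonic but nonlinear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
print(rho)  # close to 1.0, up to floating-point rounding
```

Because only the ranks enter the calculation, the cubic relationship scores the same as a straight line would.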

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
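
A minimal sketch of that calculation in plain Python, with a check of the invariance property mentioned earlier. The function name `pearson_r` and the sample data are made up for illustration, and population (divide-by-n) moments are used by assumption; the normalisation cancels in the ratio either way.

```python
def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
    std_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5
    return cov / (std_x * std_y)


# Hypothetical, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting x and rescaling it by a positive factor leaves r unchanged.
r_shifted = pearson_r([10 * a + 7 for a in x], y)
print(r, r_shifted)
```

Note that rescaling by a negative factor would flip the sign of r.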

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
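
The pair-counting definition can be sketched directly. The function below, with the made-up name `kendall_tau`, implements exactly the formula described here (often called tau-a) and treats tied pairs as neither concordant nor discordant; statistical libraries commonly use a tie-corrected variant, so results may differ on data with ties.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# 6 pairs in total: 5 concordant, 1 discordant (the swapped middle values).
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)  # (5 - 1) / 6
```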

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

df_index  statementID  address
096999120979901932274174113, Ellisland, Kirkintilloch, Scotland, G66 2UA
18626164430792434583181582, Queens Quay, Belfast, Co Antrim, BT3 9QQ
2871312336355866542270999Redbridge Farm, Dolmans Hill, Lytchett Matravers, Poole, Dorset, BH16 6HP
3591313914046920756315847Churton House Chester Road, Churton, Chester, CH3 6LA
4103594149813677626374910122 Sunnymead, Peterborough, PE4 5BZ
537122427637428553505209Pied House Church Road, Twinstead, Sudbury, Suffolk, CO10 7NA
6101511384127822037046404871-75 Shelton Street, London, Greater London, WC2H 9JQ
7408933787788974777764523 Caroline Court, 13 Caroline Street, Birmingham, B3 1TR
8602917069500166715706930161 High Street, Barnet, EN5 5SU
9119116796589355598666233Unit D11 Rivington Court, Walter Leigh Way, Moss Industrial Estate, Leigh, WN7 3PT

Last rows

df_index  statementID  address
990783111058219476647133923555-557, Cranbrook Road, Ilford, London, IG2 6HE
991463843042277929823282114 Lebanon Road, Croydon, CR0 6US
9922690703866488361023888820-22 Wenlock Road, London, N1 7GU
9931917128725119434270415322 Clare Gardens, Barking, IG11 9JH
9941259058980185068813248717, Highbury Hill, London, N5 1SU
995124709448672836645524891White Cross Business Park, South Road, Lancaster, Lancashire, LA1 4XQ
996775415802196399375077567Wallhouse, Mansion, Torpichen, Bathgate, Westlothian, EN48 4NQ
99792031224164980050073317Marquis House, 54 Richmond Road, Twickenham, Middlesex, TW1 3BE
9986403396776612947174998950 Belsize Park, London, NW3 4EE
99970891092202214455851269128, Dunston Hill, Tring, HP23 4AT